1 Network Analysis with R

  • Visualizing networks with PowerPoint.
  • Introducing the igraph package for R.
  • Using Gephi.

2 Do Not Forget PowerPoint

Xiangyu Chang, Danyang Huang, and Hansheng Wang. A Popularity Scaled Latent Space Model for Network Structure Formulation. Statistica Sinica (accepted). 2018

3 R package: igraph

  • igraph is a collection of network analysis tools with an emphasis on efficiency, portability and ease of use. It is free and open source, and it can be used from R, Python and C/C++.

  • igraph covers three basic tasks:
    • Generating networks
    • Visualizing networks
    • Mining networks

4 Generating Networks

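library(igraph)
g1 <- graph.empty()                                     # graph with no vertices and no edges
g2 <- graph( c(1,2,2,3,3,4,5,6), directed=TRUE )        # edges 1->2, 2->3, 3->4, 5->6
g3 <- graph.star(10, mode="out")                        # star: edges point from the center outwards
g4 <- graph.lattice(c(5,5))                             # 5 x 5 lattice
g5 <- graph.lattice(length=5, dim=2)                    # the same 5 x 5 lattice, specified by length and dimension
g6 <- graph.ring(10)                                    # ring with 10 vertices
g7 <- graph.tree(10, 2)                                 # tree with 10 vertices, 2 children per vertex
g8 <- graph.full(5, loops=TRUE)                         # complete graph on 5 vertices, with self-loops
g9 <- graph.full.citation(10)                           # every vertex is linked to all earlier vertices
g10 <- graph.atlas(sample(0:1252, 1))                   # a random graph from the Graph Atlas
# An edge list with vertex names: one edge per row
el <- matrix( c("foo", "bar", "bar", "foobar"), ncol=2, byrow=TRUE)
g11 <- graph.edgelist(el)
g12 <- graph.extended.chordal.ring(15, matrix(c(3,12,4,7,8,11), nrow=2))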
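
The dotted generator names above are the older igraph spelling. Assuming igraph 1.0 or later is installed, some of the same graphs can be sketched with the make_*() family. The object names (ring, star, and so on) are chosen here only for illustration.

```r
library(igraph)

# Newer-style equivalents of some generators used above
ring    <- make_ring(10)                      # cf. graph.ring(10)
star    <- make_star(10, mode = "out")        # cf. graph.star(10, mode = "out")
lattice <- make_lattice(c(5, 5))              # cf. graph.lattice(c(5, 5))
tree    <- make_tree(10, children = 2)        # cf. graph.tree(10, 2)
full    <- make_full_graph(5, loops = TRUE)   # cf. graph.full(5, loops = TRUE)

# Tabulate the number of vertices and edges of each graph
sizes <- sapply(
  list(ring = ring, star = star, lattice = lattice, tree = tree, full = full),
  function(g) c(vertices = vcount(g), edges = ecount(g))
)
print(sizes)
```

Checking vcount() and ecount() right after construction is a cheap way to confirm that a generator produced the graph you expected.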

5 Visualization

  • plot(): simple non-interactive 2D plotting to R graphics devices.

  • tkplot(): interactive 2D plotting using the tcltk package. It can only handle graphs of moderate size; a thousand vertices is probably already too many.

  • rglplot(): an experimental function that draws graphs in 3D using OpenGL.
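
Because plot() draws to whichever R graphics device is active, the same call can write a figure to a file. A minimal sketch, assuming a PDF is wanted; out_file is a hypothetical temporary path:

```r
library(igraph)

g <- make_ring(10)

# Open a PDF device, draw the graph, then close the device to flush the file
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file, width = 5, height = 5)
plot(g, vertex.size = 10, vertex.label = NA)
dev.off()

file.exists(out_file)
```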

g2 <- graph( c(1,2,2,3,3,4,5,6), directed=TRUE )
plot(g2)

g3 <- graph.star(10, mode="out")
plot(g3)

g5 <- graph.lattice(length=5, dim=2)
plot(g5)

g6 <- graph.ring(10)
plot(g6)

g7 <- graph.tree(10, 2)
plot(g7)

g8 <- graph.full(5, loops=TRUE)
plot(g8)

g12 <- graph.extended.chordal.ring(15, matrix(c(3,12,4,7,8,11), nr=2))
plot(g12)

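# Read an edge list (one edge per row, no header line)
test <- read.csv('block4.csv',
                 header = FALSE, stringsAsFactors = FALSE)
# The first two columns are taken as the endpoints of each edge
g <- graph.data.frame(test, directed = FALSE)
plot(g, vertex.size = 5, layout = layout.fruchterman.reingold,
     vertex.shape = 'circle', vertex.label.cex = 1.0,
     vertex.label.color = 'black', vertex.label = NA)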

#classic random graphs
g13 <- erdos.renyi.game(100,2/100,type='gnp')
plot(g13,layout=layout.fruchterman.reingold,
     vertex.size=5,vertex.label=NA)

#preferential attachment and variations
g14 <- barabasi.game(100)
plot(g14,layout=layout.fruchterman.reingold,
     vertex.size=5,vertex.label=NA,edge.arrow.size=0.1)
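
The Fruchterman-Reingold layout used above is randomized, so two plot() calls on the same graph can place vertices differently. One way around this, sketched here with an arbitrary seed, is to compute the coordinates once and pass the resulting matrix as the layout. sample_gnp() and layout_with_fr() are the newer igraph names for erdos.renyi.game() and layout.fruchterman.reingold.

```r
library(igraph)

set.seed(42)                     # arbitrary seed, only for repeatability
g <- sample_gnp(100, 2 / 100)    # G(n, p) random graph

coords <- layout_with_fr(g)      # one row of (x, y) coordinates per vertex
dim(coords)

# Both plots share the same vertex positions because they reuse coords
plot(g, layout = coords, vertex.size = 5, vertex.label = NA)
plot(g, layout = coords, vertex.size = 5, vertex.label = NA,
     vertex.color = ifelse(degree(g) == 0, "gray", "orange"))
```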

Plotting parameters

NODES                 Description
vertex.color          Node color
vertex.frame.color    Node border color
vertex.shape          One of "none", "circle", "square", "csquare", "rectangle", "crectangle", "vrectangle", "pie", "raster", "sphere"
vertex.size           Size of the node (default is 15)
vertex.size2          The second size of the node (e.g. for a rectangle)
vertex.label          Character vector used to label the nodes
vertex.label.family   Font family of the label (e.g. "Times", "Helvetica")
vertex.label.font     Font: 1 plain, 2 bold, 3 italic, 4 bold italic, 5 symbol
vertex.label.cex      Font size (multiplication factor, device-dependent)
vertex.label.dist     Distance between the label and the vertex
vertex.label.degree   Position of the label relative to the vertex: 0 is right, pi is left, pi/2 is below, -pi/2 is above

EDGES                 Description
edge.color            Edge color
edge.width            Edge width, defaults to 1
edge.arrow.size       Arrow size, defaults to 1
edge.arrow.width      Arrow width, defaults to 1
edge.lty              Line type: 0 or "blank", 1 or "solid", 2 or "dashed", 3 or "dotted", 4 or "dotdash", 5 or "longdash", 6 or "twodash"
edge.label            Character vector used to label edges
edge.label.family     Font family of the label (e.g. "Times", "Helvetica")
edge.label.font       Font: 1 plain, 2 bold, 3 italic, 4 bold italic, 5 symbol
edge.label.cex        Font size for edge labels
edge.curved           Edge curvature, range 0-1 (FALSE sets it to 0, TRUE to 0.5)
edge.arrow.mode       Vector specifying whether edges should have arrows: 0 no arrow, 1 back, 2 forward, 3 both

OTHER                 Description
margin                Empty space margins around the plot, vector of length 4
frame                 If TRUE, the plot will be framed
main                  If set, adds a title to the plot
sub                   If set, adds a subtitle to the plot

plot(g14, edge.arrow.size = .2, vertex.color = "red", vertex.size = 8,
     vertex.frame.color = "gray", vertex.label.color = "black",
     vertex.label.cex = 0.4, vertex.label.dist = 2, edge.curved = 0.2)
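
The same parameters can also be stored on the graph as vertex and edge attributes, dropping the "vertex." or "edge." prefix, so that a plain plot() call picks them up. A minimal sketch; the degree threshold and the colors are arbitrary choices for illustration:

```r
library(igraph)

set.seed(1)                                # arbitrary seed
g <- sample_pa(50)                         # newer name for barabasi.game()

deg <- degree(g, mode = "all")
V(g)$size  <- 4 + 2 * sqrt(deg)            # larger nodes for higher degree
V(g)$color <- ifelse(deg >= 5, "tomato", "lightblue")
V(g)$label <- NA                           # suppress vertex labels
E(g)$arrow.size <- 0.2
E(g)$color <- "gray60"

plot(g, main = "Node size scaled by degree")
```

Attributes set this way stay with the graph object, which helps when the same styling is reused across several plots.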

6 Reading Network Data

6.1 DATASET 1: Edge List

nodes <- read.csv("netscix2016/Dataset1-Media-Example-NODES.csv", header=TRUE, as.is=TRUE)
links <- read.csv("netscix2016/Dataset1-Media-Example-EDGES.csv", header=TRUE, as.is=TRUE)
head(nodes)
##    id               media media.type type.label audience.size
## 1 s01            NY Times          1  Newspaper            20
## 2 s02     Washington Post          1  Newspaper            25
## 3 s03 Wall Street Journal          1  Newspaper            30
## 4 s04           USA Today          1  Newspaper            32
## 5 s05            LA Times          1  Newspaper            20
## 6 s06       New York Post          1  Newspaper            50
head(links)
##   from  to weight      type
## 1  s01 s02     10 hyperlink
## 2  s01 s02     12 hyperlink
## 3  s01 s03     22 hyperlink
## 4  s01 s04     21 hyperlink
## 5  s04 s11     22   mention
## 6  s05 s15     21   mention
nrow(nodes); length(unique(nodes$id))
## [1] 17
## [1] 17
nrow(links); nrow(unique(links[,c("from", "to")]))
## [1] 52
## [1] 49
# Collapse multiple links of the same type between the same two nodes
# by summing their weights, using aggregate() by "from", "to", & "type":
# (we don't use "simplify()" here so as not to collapse different link types)
links <- aggregate(links[,3], links[,-3], sum)
links <- links[order(links$from, links$to),]
colnames(links)[4] <- "weight"
rownames(links) <- NULL
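
Once the links are collapsed, a node table and an edge table of this shape can be turned into an igraph object with graph_from_data_frame(). The sketch below uses small made-up tables (nodes_toy and links_toy are hypothetical, not the media data set) so that it runs without the CSV files:

```r
library(igraph)

# Hypothetical toy data with the same column layout as the media example
nodes_toy <- data.frame(id    = c("s01", "s02", "s03"),
                        media = c("A", "B", "C"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from   = c("s01", "s01", "s01", "s02"),
                        to     = c("s02", "s02", "s03", "s03"),
                        weight = c(10, 12, 22, 5),
                        type   = c("hyperlink", "hyperlink", "hyperlink", "mention"),
                        stringsAsFactors = FALSE)

# Same collapsing step as above: sum weights over identical (from, to, type)
links_toy <- aggregate(links_toy[, 3], links_toy[, -3], sum)
colnames(links_toy)[4] <- "weight"

# The first two columns of d are the edge endpoints;
# the remaining columns become edge attributes
net <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)
vcount(net); ecount(net)
E(net)$weight
V(net)$media
```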

6.2 DATASET 2: Adjacency Matrix

nodes2 <- read.csv("netscix2016/Dataset2-Media-User-Example-NODES.csv", header=TRUE, as.is=TRUE)
links2 <- read.csv("netscix2016/Dataset2-Media-User-Example-EDGES.csv", header=TRUE, row.names=1)
# Examine the data:
head(nodes2)
##    id   media media.type media.name audience.size
## 1 s01     NYT          1  Newspaper            20
## 2 s02    WaPo          1  Newspaper            25
## 3 s03     WSJ          1  Newspaper            30
## 4 s04    USAT          1  Newspaper            32
## 5 s05 LATimes          1  Newspaper            20
## 6 s06     CNN          2         TV            56
head(links2)
##     U01 U02 U03 U04 U05 U06 U07 U08 U09 U10 U11 U12 U13 U14 U15 U16 U17
## s01   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
## s02   0   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
## s03   0   0   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0
## s04   0   0   0   0   0   0   0   0   1   1   1   0   0   0   0   0   0
## s05   0   0   0   0   0   0   0   0   0   0   1   1   1   0   0   0   0
## s06   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   1
##     U18 U19 U20
## s01   0   0   0
## s02   0   0   1
## s03   0   0   0
## s04   0   0   0
## s05   0   0   0
## s06   0   0   0
# links2 is an adjacency matrix for a two-mode network:
links2 <- as.matrix(links2)
dim(links2)
## [1] 10 20
dim(nodes2)
## [1] 30  5
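
A rows-by-columns matrix like links2 describes a two-mode (bipartite) network, and igraph can build the graph from it directly. The sketch below uses a tiny made-up matrix m (hypothetical). It assumes the function name graph_from_incidence_matrix(), which newer igraph releases have renamed to graph_from_biadjacency_matrix().

```r
library(igraph)

# Hypothetical 2 x 3 matrix: rows are media, columns are users
m <- matrix(c(1, 1, 0,
              0, 1, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s01", "s02"), c("U01", "U02", "U03")))

# Assumption: graph_from_incidence_matrix() is available in the installed
# igraph; newer releases call it graph_from_biadjacency_matrix()
net2 <- graph_from_incidence_matrix(m)

V(net2)$name
V(net2)$type    # FALSE for row vertices (media), TRUE for column vertices (users)
ecount(net2)

# Draw the two vertex sets on separate rows
plot(net2, layout = layout_as_bipartite(net2), vertex.size = 20)
```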

7 Mining Graphs

  • Graph attributes

  • Computing features of graphs

  • Community Detection

  • Link Prediction

g <- barabasi.game(30)
degree(g)
>  [1] 13  5  4  1  1  1  5  1  1  2  1  2  1  1  2  1  1  1  1  1  1  2  1
> [24]  2  1  1  1  1  1  1
E(g)
> + 29/29 edges from 365327d:
>  [1]  2-> 1  3-> 1  4-> 2  5-> 1  6-> 1  7-> 1  8-> 1  9-> 7 10-> 2 11-> 3
> [11] 12-> 7 13-> 1 14-> 2 15->12 16-> 2 17-> 1 18->10 19-> 3 20->15 21-> 1
> [21] 22-> 7 23->22 24-> 1 25-> 3 26-> 1 27-> 7 28-> 1 29->24 30-> 1
V(g)
> + 30/30 vertices, from 365327d:
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
> [24] 24 25 26 27 28 29 30
shortest.paths(g, v = 1)
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
> [1,]    0    1    1    2    1    1    1    1    2     2     2     2     1
>      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
> [1,]     2     3     2     1     3     2     4     1     2     3     1
>      [,25] [,26] [,27] [,28] [,29] [,30]
> [1,]     2     1     2     1     2     1
  • Centrality: closeness(), betweenness() and page.rank()

  • Community Detection: walktrap.community(), spinglass.community() and edge.betweenness.community()

  • Others
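
The centrality functions listed above are easiest to read on a graph whose answer is obvious. A minimal sketch on an undirected star, where vertex 1 is the hub; page_rank() is the newer spelling of page.rank():

```r
library(igraph)

g <- make_star(5, mode = "undirected")   # vertex 1 is the hub

degree(g)         # the hub touches every other vertex
closeness(g)      # the hub is closest to all others
betweenness(g)    # every shortest path between two leaves passes through the hub

pr <- page_rank(g)$vector   # PageRank scores, one per vertex
pr
which.max(pr)               # index of the top-ranked vertex
```

All four measures single out the hub here. On real networks they often disagree, which is why it can help to compare several of them.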

karate <- make_graph("Zachary")
wc <- cluster_walktrap(karate)
modularity(wc)
## [1] 0.3532216
membership(wc)
##  [1] 1 1 2 1 5 5 5 1 2 2 5 1 1 2 3 3 5 1 3 1 3 1 3 4 4 4 3 4 2 3 2 2 3 3
plot(wc, karate)
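
Other community detection methods return the same kind of object, so they can be swapped in and compared by modularity. A sketch using two alternatives chosen here for illustration, cluster_edge_betweenness() (the newer name for edge.betweenness.community()) and cluster_fast_greedy():

```r
library(igraph)

karate <- make_graph("Zachary")

eb <- cluster_edge_betweenness(karate)   # divisive, based on edge betweenness
fg <- cluster_fast_greedy(karate)        # greedy modularity optimization

# Compare the modularity of the two partitions
c(edge_betweenness = modularity(eb), fast_greedy = modularity(fg))

# Community sizes, and how the two partitions overlap
sizes(eb)
table(as.vector(membership(eb)), as.vector(membership(fg)))
```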

  • Xiangyong Cao, Xiangyu Chang and Yuewen Liu. Community Detection for Clustered Attributed Graphs via a Variational EM Algorithm. The 3rd ASE Conference series on Big Data Science and Computing, August 4-7, 2014.
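
Link prediction is listed among the mining tasks but has no example above. One simple baseline, used here only as an illustration and not taken from the cited papers, scores each unconnected pair of vertices by the Jaccard similarity of their neighborhoods and ranks the pairs:

```r
library(igraph)

karate <- make_graph("Zachary")

# Jaccard similarity of neighbor sets for every pair of vertices
scores <- similarity(karate, method = "jaccard")

# Keep only pairs (i < j) that are not already connected
adj <- as_adjacency_matrix(karate, sparse = FALSE)
scores[adj > 0] <- NA
scores[lower.tri(scores, diag = TRUE)] <- NA

# Rank the remaining pairs by score, highest first, and keep the top five
idx <- order(scores, decreasing = TRUE, na.last = NA)[1:5]
candidates <- data.frame(arrayInd(idx, dim(scores)), score = scores[idx])
colnames(candidates)[1:2] <- c("v1", "v2")
candidates
```

A higher score means that the two vertices share a larger fraction of their neighbors, which this baseline treats as evidence for a missing or future link.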